486 research outputs found

    Mixed-rates asymptotics

    A general method is presented for deriving the limiting behavior of estimators that are defined as the values of parameters optimizing an empirical criterion function. The asymptotic behavior of such estimators is typically deduced from uniform limit theorems for rescaled and reparametrized criterion functions. The new method can handle cases where the standard approach does not yield the complete limiting behavior of the estimator. The asymptotic analysis depends on a decomposition of criterion functions into sums of components with different rescalings. The method is explained by examples from Lasso-type estimation, k-means clustering, Shorth estimation and partial linear models. Comment: Published at http://dx.doi.org/10.1214/009053607000000668 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization

    We propose a novel high-dimensional linear regression estimator: the Discrete Dantzig Selector, which minimizes the number of nonzero regression coefficients subject to a budget on the maximal absolute correlation between the features and residuals. Motivated by the significant advances in integer optimization over the past 10-15 years, we present a Mixed Integer Linear Optimization (MILO) approach to obtain certifiably optimal global solutions to this nonconvex optimization problem. The current state of algorithmics in integer optimization makes our proposal substantially more computationally attractive than the least squares subset selection framework based on integer quadratic optimization, recently proposed in [8], and the continuous nonconvex quadratic optimization framework of [33]. We propose new discrete first-order methods, which, when paired with state-of-the-art MILO solvers, lead to good solutions for the Discrete Dantzig Selector problem for a given computational budget. We illustrate that our integrated approach provides globally optimal solutions in significantly shorter computation times, when compared to off-the-shelf MILO solvers. We demonstrate both theoretically and empirically that in a wide range of regimes the statistical properties of the Discrete Dantzig Selector are superior to those of popular ℓ1-based approaches. We illustrate that our approach can handle problem instances with p = 10,000 features with certifiable optimality, making it a highly scalable combinatorial variable selection approach in sparse linear modeling.
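
The estimator described in this abstract can be written as a single optimization problem. The block below is a sketch of that verbal description only; the symbols $X$ (feature matrix), $y$ (response vector), $\beta$ (coefficient vector) and $\delta$ (the budget) are notation assumed here and do not appear in the abstract.

```latex
\begin{equation*}
  \min_{\beta \in \mathbb{R}^{p}} \; \|\beta\|_{0}
  \quad \text{subject to} \quad
  \bigl\| X^{\top} (y - X\beta) \bigr\|_{\infty} \le \delta ,
\end{equation*}
% \|\beta\|_0 counts the nonzero coefficients; the constraint bounds the
% largest absolute inner product between any feature column and the residual.
```

With the columns of $X$ standardized, the constraint corresponds to the "maximal absolute correlation between the features and residuals" named in the abstract. Replacing $\|\beta\|_{0}$ with $\|\beta\|_{1}$ would give the ℓ1-based counterpart that the abstract compares against.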

    Improved variable selection with Forward-Lasso adaptive shrinkage

    Recently, considerable interest has focused on variable selection methods in regression situations where the number of predictors, p, is large relative to the number of observations, n. Two commonly applied variable selection approaches are the Lasso, which computes highly shrunk regression coefficients, and Forward Selection, which uses no shrinkage. We propose a new approach, "Forward-Lasso Adaptive SHrinkage" (FLASH), which includes the Lasso and Forward Selection as special cases, and can be used in both the linear regression and the Generalized Linear Model domains. As with the Lasso and Forward Selection, FLASH iteratively adds one variable to the model in a hierarchical fashion but, unlike these methods, at each step adjusts the level of shrinkage so as to optimize the selection of the next variable. We first present FLASH in the linear regression setting and show that it can be fitted using a variant of the computationally efficient LARS algorithm. Then, we extend FLASH to the GLM domain and demonstrate, through numerous simulations and real world data sets, as well as some theoretical analysis, that FLASH generally outperforms many competing approaches. Comment: Published at http://dx.doi.org/10.1214/10-AOAS375 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    COVID-19 second wave mortality in Europe and the United States

    This paper introduces new methods to analyze the changing progression of COVID-19 cases to deaths in different waves of the pandemic. First, an algorithmic approach partitions each country or state's COVID-19 time series into a first wave and subsequent period. Next, offsets between case and death time series are learned for each country via a normalized inner product. Combining these with additional calculations, we can determine which countries have most substantially reduced the mortality rate of COVID-19. Finally, our paper identifies similarities in the trajectories of cases and deaths for European countries and U.S. states. Our analysis refines the popular conception that the mortality rate has greatly decreased throughout Europe during its second wave of COVID-19; instead, we demonstrate substantial heterogeneity throughout Europe and the U.S. The Netherlands exhibited the largest reduction of mortality, a factor of 16, followed by Denmark, France, Belgium, and other Western European countries, greater than both Eastern European countries and U.S. states. Some structural similarity is observed between Europe and the United States, in which Northeastern states have been the most successful in the country. Such analysis may help European countries learn from each other's experiences and differing successes to develop the best policies to combat COVID-19 as a collective unit. Comment: Accepted manuscript. New appendix relative to v
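
One way to read the "offsets ... learned ... via a normalized inner product" step is as a search for the lag at which the case series and the shifted death series are most nearly proportional. The sketch below illustrates that reading only; the abstract does not give the paper's actual procedure, and the function names, the `max_offset` parameter and the 30-day search window are all assumptions made for illustration.

```python
import math


def normalized_inner_product(a, b):
    """Cosine-style similarity <a, b> / (||a|| * ||b||); 0.0 if either norm is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_offset(cases, deaths, max_offset=30):
    """Return (lag, score) maximizing the normalized inner product between
    cases[t] and deaths[t + lag] over lags 0..max_offset (hypothetical helper)."""
    if len(cases) != len(deaths) or len(cases) == 0:
        raise ValueError("cases and deaths must be non-empty and of equal length")
    n = len(cases)
    best_lag, best_score = 0, float("-inf")
    for lag in range(min(max_offset, n - 1) + 1):
        # Align day t of the case series with day t + lag of the death series.
        score = normalized_inner_product(cases[: n - lag], deaths[lag:])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic usage: deaths are a scaled copy of cases delayed by 7 days.
cases = [math.exp(-(((t - 30) / 5.0) ** 2)) for t in range(100)]
deaths = [0.0] * 7 + [0.02 * c for c in cases[:-7]]
lag, score = best_offset(cases, deaths)
```

Because the score is scale-invariant, the recovered lag does not depend on the overall mortality ratio, which can then be estimated separately once the two series are aligned.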